Final Project: Exploring the Relationship Between COVID Vaccination and Flu Rates in 57 California Counties, from Q4-2022 to Q2-2023

Authors

Ngan Nguyen, Nicole Fernandez, Shirley Sui

Problem Statement

There is growing interest in understanding the potential impact of COVID-19 vaccination on existing respiratory illnesses such as seasonal flu. Knowing whether and where COVID-19 vaccination correlates with flu incidence and severity can help inform the use of public health flu prevention resources. In this project, we explore the relationship between COVID-19 vaccination and flu incidence and severity rates over time, geography, and age group, using COVID vaccination data and simulated flu data obtained from the California Department of Public Health (CDPH), to address three main questions:

  1. Do COVID vaccination rates reflect flu vaccination rates?
  2. Is there any correlation between COVID vaccination rates and flu severity?
  3. Does COVID vaccination affect flu severity within each age group?

Methods

Data Considerations

Data collection was organized into quarters: quarter 4 (Q4) spans October to December, quarter 1 (Q1) spans January to March, and quarter 2 (Q2) spans April to June. Q4-Q2 (October through June of the following year) are of particular interest because they cover the colder months, when influenza transmission is historically elevated and flu activity typically peaks. This is also likely why the simulated flu data only contained data for Q4-Q2.
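The quarter definitions above can be sketched as a small date-to-quarter mapping. This is an illustration in Python, not the R code used for the project; the function name flu_quarter and the label format (for example "Q4-2022") are assumptions chosen to match the labels used in this report.

```python
from datetime import date
from typing import Optional


def flu_quarter(d: date) -> Optional[str]:
    """Map a date to a study-quarter label, or None for July-September."""
    if d.month >= 10:
        return f"Q4-{d.year}"  # October to December
    if d.month <= 3:
        return f"Q1-{d.year}"  # January to March
    if d.month <= 6:
        return f"Q2-{d.year}"  # April to June
    return None  # Q3 is outside the flu season window described above
```

Under these assumptions, a report date of 15 November 2022 would fall in Q4-2022, and a date in August would be excluded.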

Data Cleaning

Simulated flu data was obtained from the CDPH in two parts: one dataset with all counties besides Los Angeles County (the California dataset) and one dataset with LA County data alone. Both datasets were examined prior to data cleaning. Cleaning included renaming columns to match between the datasets, converting columns to date or factor as appropriate, adding a county column to the LA County dataset (so it could be combined with the California dataset), and recoding race and ethnicity data in the California dataset.

The two datasets recorded dates differently. The California data codebook contained a warning not to use dt_diagnosis in data analysis, the LA County data did not have dt_report values, and converting dt_dx in the LA County data to dt_report was inefficient. We therefore converted dt_report from the California dataset and dt_dx from the LA County dataset directly to quarters.

The LA County data was then added to the California dataset using bind_rows. The combined data was grouped by age_category, county, and quarter, and dx_new and severe_new were summed within each stratum. We used these sums to calculate the new diagnosis rate (sum of new cases / susceptible population) and the flu severity rate (sum of new severe cases / sum of new cases).
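The grouping and rate calculations described above can be sketched as follows. This is a minimal Python illustration rather than the project's R code. The function name summarize_flu and the susceptible lookup (one susceptible population value per stratum) are assumptions, since the report does not state how the susceptible population was stored.

```python
from collections import defaultdict


def summarize_flu(rows, susceptible):
    """Sum dx_new and severe_new per (age_category, county, quarter) stratum
    and derive the two rates defined in the text.

    rows: iterable of dicts with age_category, county, quarter, dx_new, severe_new.
    susceptible: dict mapping each stratum key to its susceptible population
                 (hypothetical structure, assumed for this sketch).
    """
    totals = defaultdict(lambda: {"sum_dx_new": 0, "sum_severe_new": 0})
    for r in rows:
        key = (r["age_category"], r["county"], r["quarter"])
        totals[key]["sum_dx_new"] += r["dx_new"]
        totals[key]["sum_severe_new"] += r["severe_new"]

    summary = {}
    for key, t in totals.items():
        dx, severe = t["sum_dx_new"], t["sum_severe_new"]
        summary[key] = {
            "sum_dx_new": dx,
            "sum_severe_new": severe,
            # new diagnosis rate = sum of new cases / susceptible population
            "incidence_rate": dx / susceptible[key],
            # severity rate = sum of new severe cases / sum of new cases;
            # undefined (None) when a stratum has no new cases
            "severity_rate": severe / dx if dx else None,
        }
    return summary
```

Returning None for strata with no new cases is a design choice of this sketch; it keeps the undefined severity rate explicit so that such rows can be dropped later as missing values.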

COVID vaccination data obtained from CDPH was cleaned to match the simulated flu (morbidity) datasets. This involved removing duplicate values, renaming columns and values to match the simulated flu datasets, applying pivot_wider to the demographic_category column to split each demographic type into a separate column, and reclassifying the age column to match the flu data (specifically, combining “under 5”, “5-11”, and “12-17” into one category of “0-17”). A percent vaccinated variable was created and defined as cumulative fully vaccinated / estimated population. After the data was grouped by age_category, county, and quarter, this variable was used to calculate the mean vaccination rate per stratum.
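The age reclassification and the percent vaccinated variable can be illustrated with a short Python sketch (again, not the project's R code). The field names age, cumulative_fully_vaccinated, and estimated_population are hypothetical stand-ins for the CDPH columns, and the sketch assumes the age labels appear exactly as quoted above.

```python
from collections import defaultdict
from statistics import mean

# Collapse the three child age groups into the flu data's "0-17" category.
AGE_RECODE = {"under 5": "0-17", "5-11": "0-17", "12-17": "0-17"}


def recode_age(label):
    """Return the flu-data age category for a vaccination-data age label."""
    return AGE_RECODE.get(label, label)


def mean_pct_vaccinated(rows):
    """Mean of (cumulative fully vaccinated / estimated population) per stratum.

    rows: iterable of dicts with age, county, quarter,
          cumulative_fully_vaccinated, estimated_population (hypothetical names).
    """
    by_stratum = defaultdict(list)
    for r in rows:
        key = (recode_age(r["age"]), r["county"], r["quarter"])
        pct = r["cumulative_fully_vaccinated"] / r["estimated_population"]
        by_stratum[key].append(pct)
    return {key: mean(values) for key, values in by_stratum.items()}
```

Note that averaging the proportions of the three child age groups weights each group equally regardless of population size; whether the original analysis did the same is not stated in this report.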

The COVID vaccination data was filtered to Q4 2022 to Q2 2023, as the flu data only contained data for these quarters. Then, flu data and COVID vaccination data were joined using inner_join by age_category, county, and quarter, after which NA values and the sum_dx_new and sum_severe_new columns were dropped. This final dataset was used to create Tables 1-2 and Figures 1-2.
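The join step can be sketched in Python as an inner join on the (age_category, county, quarter) key, followed by the two drop steps described above. This is an illustration only; the project used inner_join in R, and the function name join_flu_and_vax and the dictionary layout are assumptions.

```python
DROPPED_COLUMNS = {"sum_dx_new", "sum_severe_new"}


def join_flu_and_vax(flu, vax):
    """Inner-join two per-stratum summaries keyed by (age_category, county, quarter).

    flu: dict mapping stratum key -> dict of flu summary values.
    vax: dict mapping stratum key -> mean percent vaccinated.
    Only strata present in BOTH inputs are kept, as in an inner join.
    """
    joined = {}
    for key in flu.keys() & vax.keys():
        row = {k: v for k, v in flu[key].items() if k not in DROPPED_COLUMNS}
        row["pct_vaccinated"] = vax[key]
        if any(value is None for value in row.values()):
            continue  # drop rows with missing (NA) values
        joined[key] = row
    return joined
```

Because the join keeps only matching keys, any stratum that appears in just one of the two datasets is silently excluded from the final dataset.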

Do COVID vaccination rates reflect flu vaccination rates?

Figure 1 was created by grouping the data by county and quarter to calculate the mean % vaccinated and the mean flu incidence rate. These two values were then plotted against each other on a scatter plot with a best-fit line, with COVID vaccination as the independent variable.
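For readers unfamiliar with how a best-fit line is obtained, the following Python sketch computes an ordinary least-squares line through a set of points. It assumes the best-fit line in Figure 1 is a simple linear fit, which the report does not state explicitly, and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    xs: independent variable (here, mean % vaccinated for COVID).
    ys: dependent variable (here, mean flu incidence rate).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope near zero would indicate that flu incidence changes little as COVID vaccination rises; the function raises ZeroDivisionError if all x values are identical.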

Is there any correlation between COVID vaccination rates and flu severity?

To create Table 1, the data was grouped by county to calculate mean values for % COVID-19 vaccination achieved and the severe flu rate of each county. These values were then used to calculate the correlation between COVID vaccination and flu severity in each county.

Does COVID vaccination affect flu severity within each age group?

Table 2 was created similarly; data was grouped by age_category, the stratum of interest for this question, and mean % COVID-19 vaccination and mean severe flu rate were calculated for each age category. Then, these values were used to calculate the correlation between COVID vaccination and flu severity within each age category.
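The within-group correlation can be sketched as follows. This Python illustration assumes the Pearson correlation coefficient, computed across the stratum-level rows within each group; the report does not name the coefficient used, and the field names pct_vaccinated and severity_rate are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_by_group(rows, group_key):
    """Correlation of pct_vaccinated with severity_rate within each group.

    rows: iterable of dicts containing group_key, pct_vaccinated, severity_rate.
    group_key: the grouping column, e.g. "age_category" or "county".
    """
    groups = defaultdict(lambda: ([], []))
    for r in rows:
        xs, ys = groups[r[group_key]]
        xs.append(r["pct_vaccinated"])
        ys.append(r["severity_rate"])
    return {g: round(pearson(xs, ys), 2) for g, (xs, ys) in groups.items()}
```

The same function could be called with group_key set to "county" for a county-level table. It fails with a division by zero if either variable is constant within a group, so such groups would need to be handled separately.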

Figure 2 was created by plotting the mean flu severity rate against the mean % vaccinated for COVID (independent variable) in a scatter plot faceted by age category. A best-fit line was added to each facet to better assess the presence or absence of an association between the two variables.

Results

Table 2: High COVID vaccination % not significantly correlated with low severe flu rate within age categories, Q4-2022 - Q2-2023

Age Category    Mean % Vaccinated for COVID (proportion)    Mean Severe Flu Rate    Correlation
0-17            0.3537977                                   0.0012448               -0.05
18-49           0.6444651                                   0.0081145                0.00
50-64           0.7660298                                   0.0392774                0.05
65+             0.8302451                                   0.1031616               -0.02

Discussion

Our investigation of the relationship between COVID vaccination and severe flu morbidity across counties and age categories in California found little evidence of an association between the two.

This analysis uses aggregate-level data, however, so caution should be exercised in generalizing these findings to individual-level dynamics to avoid the ecological fallacy.

As seen in Figure 1, the COVID vaccination rate data and simulated flu data indicate that COVID vaccination does not seem to strongly reflect flu vaccination within the 57 California counties for which data was collected or simulated. Additional measures to promote flu vaccination may be needed.

Contrary to the expectation that counties with high COVID-19 vaccination would have low severe flu rates, the data in Table 1 show no such trend: we observed no correlation between the mean percentage of COVID-19 vaccination and the severity of flu cases in the studied counties.

As seen in Table 2 and Figure 2, the age-stratified analysis shows that although COVID vaccination rates are higher in older age groups, mean severe flu rates do not decrease as vaccination rates increase. The 65+ age category has both the highest mean COVID vaccination rate (about 83%) and the highest mean severe flu rate (about 10%). Within each age category, the correlation between the two measures is close to zero (between -0.05 and 0.05).